Using Distributed Balanced Trees Over DHTs for Building Large-scale Indexes
Abstract
DHT systems are structured overlay networks capable of using P2P resources as a scalable storage platform for very large data applications. However, their efficiency assumes a degree of uniformity in the association of data with index keys that is often absent in inverted indexes. Index data tends to follow non-uniform distributions, often power-law distributions, which create intense local storage hotspots and network bottlenecks on specific hosts. Current techniques such as caching cannot, on their own, cope with this...
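The hotspot effect described in the abstract can be illustrated with a small simulation. This is a hypothetical sketch, not the paper's method. It assumes a Chord-style consistent-hashing ring with virtual nodes and posting-list sizes that follow a Zipf-like 1/rank law. As a simplified stand-in for a distributed balanced tree, it splits each posting list into fixed-capacity leaves stored under derived keys (`term#leaf`). The names `Ring`, `load_single_key`, `load_chunked`, and the `term#leaf` key scheme are illustrative assumptions.

```python
import hashlib
from bisect import bisect_left
from collections import defaultdict


def ring_hash(key: str) -> int:
    """Place a string on a 32-bit identifier ring (first 4 bytes of SHA-1)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:4], "big")


class Ring:
    """Hypothetical consistent-hashing ring; each key is stored on its successor."""

    def __init__(self, nodes, vnodes=1):
        # Each physical node occupies `vnodes` points on the ring.
        self.points = sorted(
            (ring_hash(f"{node}/{v}"), node) for node in nodes for v in range(vnodes)
        )
        self.ids = [point for point, _ in self.points]

    def owner(self, key: str) -> str:
        # Successor rule: first ring point at or after hash(key), wrapping to the start.
        i = bisect_left(self.ids, ring_hash(key)) % len(self.ids)
        return self.points[i][1]


def load_single_key(ring, postings):
    """One DHT key per term: a whole posting list lands on a single host."""
    load = defaultdict(int)
    for term, size in postings.items():
        load[ring.owner(term)] += size
    return load


def load_chunked(ring, postings, capacity):
    """Split each posting list into leaves of at most `capacity` entries,
    each stored under its own derived key "term#leaf" (hypothetical scheme)."""
    load = defaultdict(int)
    for term, size in postings.items():
        n_leaves = (size + capacity - 1) // capacity  # ceiling division
        for leaf in range(n_leaves):
            entries = min(capacity, size - leaf * capacity)
            load[ring.owner(f"{term}#{leaf}")] += entries
    return load


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(64)]
    ring = Ring(nodes, vnodes=50)
    # Zipf-like posting-list sizes: the term of rank r has about 100000 / r entries.
    postings = {f"term{r}": 100_000 // r for r in range(1, 1001)}
    single = load_single_key(ring, postings)
    chunked = load_chunked(ring, postings, capacity=1_000)
    print("max host load, one key per term:", max(single.values()))
    print("max host load, chunked leaves:  ", max(chunked.values()))
```

With one key per term, the host that owns the most popular term must hold its entire posting list regardless of how many hosts join, so key-based load balancing cannot help. Splitting the list into leaves lets the ring's hashing spread it across hosts. A real distributed balanced tree would also keep internal nodes that route lookups to the right leaf and would split or merge leaves as lists grow, which this sketch omits.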
Similar resources
A Tabu-Based Cache to Improve Range Queries on Prefix Trees
Distributed Hash Tables (DHTs) provide the substrate to build large-scale distributed applications over Peer-to-Peer networks. A major limitation of DHTs is that they only support exact-match queries. In order to offer range queries over a DHT, it is necessary to build additional indexing structures. Prefix-based indexes, such as the Prefix Hash Tree (PHT), are interesting approaches for building dis...
Building Inverted Indexes Using Balanced Trees Over DHT Systems
Objects containing the document locations for popular keywords are large enough to create storage hotspots at some hosts. Since each object is assigned to a single key, DHT key-based load balancing techniques are incapable of splitting the object across several hosts. Furthermore, caching techniques only reduce network load for query operations and do not address network load during insert...
Building an Internet-Scale Service For Publishing and Locating XML Documents on PlanetLab
In recent years, there has been a growing interest in peer-to-peer (P2P) based computing and applications. One of the important challenges in P2P environments is to quickly locate relevant data across many participating peers. In this regard, Distributed Hash Tables (DHTs) are a popular solution for building large-scale distributed applications due to their scalability, load balancing and faul...
A combination of DHTs and Peer Clustering for Distributed Information Retrieval
Distributed Hash Tables (DHTs) are very efficient for querying based on key lookups, if only a small number of keys has to be registered by each individual peer. However, building huge term indexes, as required for IR-style keyword search, is impractical with plain DHTs. Due to the large sizes of document term vocabularies, joining peers cause huge numbers of key inserts, and subsequently larg...
dFault: Fault Localization in Large-Scale Peer-to-Peer Systems
Distributed hash tables (DHTs) have been adopted as a building block for large-scale distributed systems. The upshot of this success is that their robust operation becomes even more important as mission-critical applications begin to be layered on them. Even though DHTs can detect and heal around unresponsive hosts and disconnected links, several hidden faults and performance bottlenecks go undetecte...